Receiving a natural language request and retrieving a personal voice memo

ABSTRACT

A computer-implemented method is provided. The method includes receiving commands to store memos, identifying subjects related to the memos, storing, in a database, the memos, their related subjects, and associated time information, receiving a natural language request to retrieve a memo, the request having query information, identifying a subject related to the request, responsive to the request, querying the database for memos related to the subject, identifying multiple memos in response to the database query, identifying a memo, from the multiple identified memos, that has the most recent associated time information and providing a response in dependence on the identified memo.
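The retrieval step in the abstract can be sketched in Python: several memos match a subject, and the one with the most recent time information is selected. This is a minimal, hypothetical illustration. `Memo`, `MemoStore` and `most_recent` are names introduced here. Subjects are supplied by hand rather than identified from the memo text, and an in-memory list stands in for the database.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class Memo:
    text: str
    subjects: set[str]   # subjects identified for the memo (supplied by hand in this sketch)
    stored_at: datetime  # time information stored with the memo


class MemoStore:
    """Hypothetical in-memory stand-in for the memo database."""

    def __init__(self) -> None:
        self._memos: list[Memo] = []

    def store(self, text: str, subjects: set[str], stored_at: datetime) -> None:
        self._memos.append(Memo(text, subjects, stored_at))

    def most_recent(self, subject: str) -> Memo | None:
        # Query for every memo related to the subject of the request.
        matches = [m for m in self._memos if subject in m.subjects]
        if not matches:
            return None
        # When several memos match, keep the one with the latest time information.
        return max(matches, key=lambda m: m.stored_at)


store = MemoStore()
store.store("I put the car key in my brown bag.", {"car key"}, datetime(2019, 1, 20, 9, 0))
store.store("I put the car key on the hall table.", {"car key"}, datetime(2019, 1, 22, 18, 30))
latest = store.most_recent("car key")
print(latest.text if latest else "No matching memo.")
```

A response would then be generated from `latest`, not from the older memo about the brown bag.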

RELATED APPLICATION

This application is a continuation of U.S. application Ser. No. 16/255,674, entitled “Using A Virtual Assistant To Store A Personal Voice Memo And To Obtain A Response Based On A Stored Personal Voice Memo That Is Retrieved According To A Received Query”, filed on Jan. 23, 2019, naming inventors Mara Selvaggi, Irina A. Spiridonova and Karl Stahl, the application of which is hereby incorporated by reference.

BACKGROUND

Existing note-taking applications, such as Evernote® and Simplenote®, allow users to write notes using a manual input modality. However, such applications do not record memos, play back memos or play back intelligent interpretations of memos using a spoken modality.

Some voice memo applications, such as Zoho Notebook® and Voice Memos® for iOS®, allow users to record and play back memos, starting and stopping the recording using a manual modality (see submitted non-patent literature “Zoho”). However, such applications do not support explicit or implicit searching for information in memos or retrieving information from the memos using voice modalities.

Conventional smart-speaker virtual assistants allow storing and retrieving information using voice in limited ways. For example, Google Assistant® and Siri® can add and retrieve events from a cloud-stored calendar. However, using this feature requires the user to specify the content and the requests carefully and precisely to make the system do what is desired. For example, if a user asks Siri® “When is my husband's birthday?” and that information has not been pre-set in that user's device or device ecosystem, Siri® will reply “I don't know who your husband is.”

Cardona® teaches, at a high level, how to use various current commercial virtual assistants to store arbitrary voice notes (see submitted non-patent literature “Cardona”). All systems implemented by Cardona® essentially transcribe speech to text that users can only retrieve through a visual modality. Prior art systems do not even allow a system to read back notes, or a summary of notes, using text-to-speech. Doing so without significantly wasting the time of a user listening to extraneous neighboring words and irrelevant information is a non-trivial and unsolved problem.

Voicera® describes the existence, without enablement, of summarization of voice notes (see submitted non-patent literature “Voicera”). However, Voicera® still relies on a visual modality for reviewing information and does not address the problem of providing relevant information for users, using a speech modality, without wasting time with extraneous neighboring words and irrelevant information.

U.S. Patent Application Publication No. 2006/0064411 A1 with title “Search engine using user intent” filed by Gross, et al., teaches a system for searching with results ranked based, in part, on past user activity. However, it does not use natural language and is not applicable to conversational voice search. Also, it does not provide for a user to explicitly retrieve stored information.

U.S. Pat. No. 6,675,159 B1 with title “Concept-based search and retrieval system” issued to Lin, et al., teaches a system for natural-language-based retrieval of multimedia information stored with appropriate attribute metadata. However, the system only addresses retrieving multimedia information. It does not teach retrieval of information used to complete the interpretations of, or respond verbally to, natural language queries.

The submitted non-patent literature “Kolodner” teaches a specific speed-and-storage efficient method for storing and organizing facts for natural-language-based storage and retrieval. It is limited to a single domain of knowledge and would not be practical to implement for any arbitrary domains or conversation topics.

U.S. Patent Application Publication No. 2014/0365222 A1 with title “Mobile systems and methods of supporting natural language human-machine interactions” filed by Weider teaches a method of storage and retrieval of personal information, such as user profile and environmental information. However, it does not extract information from conversational natural language expressions, and it does not filter for particular relevant information to retrieve for interpreting and responding to later natural language requests.

Thus, a need arises for speech recognition technology that is capable of recording voice memorandums (i.e., memos), intelligently storing the memos along with information derived from the memos, and intelligently retrieving information contained in or derived from the stored memos.

Additionally, voice-enabled virtual assistants currently do not have the capability to intelligently learn the preferences or favorites of a user and then later use that information to answer a question from the user. For example, Siri® does not learn a person's preferences or favorites in an intelligent manner. Specifically, when a user asks Siri® “What is my favorite restaurant?” Siri® interprets the user as asking about Siri's own preference and responds “I don't eat out that much.” Furthermore, when other virtual assistants are asked “What is my favorite restaurant?” they pick a restaurant that has the word “favorite” in its name, such as “My Favorite Cafe.” The Google Maps® application has an option to add places to a “Favorites” list, a “Want to Go” list or a “Starred Places” list, but it does not allow those lists to be queried using one's own voice. Google Assistant® has a feature of remembering a favorite place; however, it is able to store only a limited number of places and does not allow users to reliably query them (e.g., to get directions to that place). For example, a Google Assistant® (GA) interaction goes as follows: (i) user: “do you know what my favorite restaurant is?”; (ii) GA: “I don't know that yet. What's your favorite restaurant?”; (iii) user: “my favorite restaurant is Red Lobster”; (iv) GA: “OK, I'll remember that”; (v) user: “do you know what is my favorite beach?”; (vi) GA: “I remember you told me. ‘My favorite restaurant is Red Lobster’.”; (vii) user: “can you give me directions to my favorite restaurant?”; and (viii) GA: “Here you go. Directions from your location to IHOP . . . .” As is clear from the prior art, much-needed improvement remains with respect to incorporating a user's preferences or favorites into a voice-enabled virtual assistant.

Accordingly, an additional need arises for speech enabled virtual assistants that intelligently store favorite information of a user for subsequent retrieval and presentation to the user at the appropriate time.

SUMMARY

The technology disclosed relates to (i) speech enabled virtual assistants implementing technology that is capable of recording voice memorandums (i.e., memos), intelligently storing the memos along with information derived from the memos, and intelligently retrieving information contained in or derived from the stored memos, and (ii) speech enabled virtual assistants implementing technology that intelligently stores favorite information of a user for subsequent retrieval and presentation to the user at the appropriate time.

Regarding the recording, storage and retrieval of memos, the technology disclosed receives (by a virtual assistant) a natural language utterance that includes memo information, interprets the received utterance according to a natural language grammar rule associated with a memo domain, and stores (in a database) a memo that is derived from the interpretation of the memo information. It then receives another natural language utterance expressing a request (i.e., a request to query memo data from the database) and interprets that utterance according to a natural language grammar rule for retrieving memo data, such that the natural language grammar rule for retrieving memo data recognizes query information. In response to a successful interpretation of the natural language utterance, the technology disclosed uses the recognized query information to query the database for specific memo data related to the recognized query information, and provides, to the user, a response generated in dependence upon the queried-for specific memo data.

Regarding the storing and retrieval of favorite information, the technology disclosed operates in a similar manner as the storing and retrieval of memos.

Particular aspects of the technology disclosed are described in the claims, specification and drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a block diagram of an example environment capable of speech enabled virtual assistants implementing technology that is capable of receiving a request or query and intelligently retrieving information contained in or derived from previously stored memos.

FIG. 2 illustrates a block diagram of an example environment capable of speech enabled virtual assistants implementing technology that is capable of recording voice memorandums (i.e., memos) and intelligently storing the memos along with information derived from the memos.

FIG. 3 illustrates a block diagram of an example environment capable of speech enabled virtual assistants implementing technology that intelligently retrieves and presents favorite information of a user contained in or derived from previously identified and stored favorites.

FIG. 4 illustrates a block diagram of an example environment capable of speech enabled virtual assistants implementing technology that is capable of receiving favorites and intelligently storing the favorites along with information derived from the favorites.

FIGS. 5A, 5B and 5C show three example implementations of the technology disclosed using different types of virtual assistants.

FIG. 6 shows an overhead view of an automobile designed to implement the technology disclosed.

FIG. 7 illustrates an example environment in which personal memos and/or favorites can be stored, searched and retrieved for generation of intelligent responses using the technology disclosed.

FIG. 8 is a block diagram of an example computer system that can implement various components of the environment of FIG. 7.

FIG. 9 illustrates TABLE 1, which includes example phrases that would trigger the storing of a personal memo.

FIG. 10 illustrates TABLE 2, which includes example phrases that would trigger the storing of a personal memo.

FIG. 11 illustrates TABLE 3, which includes example ways of invoking the storing of favorite information, querying favorite information and possible responses from a virtual assistant.

FIG. 12 illustrates TABLE 4, which includes example ways of using favorite information for obtaining directions and travel information.

FIG. 13 illustrates TABLE 5, which includes example ways of storing multiple favorites for a specific category and then later obtaining specific information for both of the favorites in the same category or obtaining favorite information of multiple favorites based on geographical location.

DETAILED DESCRIPTION

The following detailed description is made with reference to the figures. Example implementations are described to illustrate the technology disclosed, not to limit its scope, which is defined by the claims. Those of ordinary skill in the art will recognize a variety of equivalent variations on the description that follows.

Examples of Voice Memorandums

An aspect of the technology disclosed relates to speech-enabled virtual assistants implementing recognition technology that is capable of recording voice memorandums (i.e., memos, or personal memos), intelligently storing the memos along with information derived from the memos, and intelligently retrieving information contained in or derived from the stored memos. Two specific examples of this speech recognition technology that is capable of recording and intelligently storing memos and related information and retrieving information in dependence upon the stored memos are provided below.

The first example relates to cooking lasagna. The scenario is that just about every recipe on the internet indicates that lasagna should be cooked for 40 minutes. However, a particular user has determined that with their oven 40 minutes is too much, and as a result, their lasagna is always burned. The user was able to determine through experience that the perfect cooking time for their lasagna is 30 minutes. In order to remember that the perfect time for cooking lasagna in their oven is 30 minutes, the user will have an interaction with a virtual assistant (or some other type of technology that is capable of speech recognition and feedback) as follows (note that only the text in italics is the voice exchange or interaction with the virtual assistant; and the virtual assistant is named Hound):

(i) User: “Ok Hound. To get a perfect lasagna, I cook it in the oven for 30 minutes.” [this phrase uttered by the user was identified by the virtual assistant as being related to a memo or a memo domain in dependence upon the virtual assistant identifying the trigger words “I” (a personal pronoun) and “cook” (a verb)].

(ii) User: “Ok Hound. How long should I cook the lasagna?” [this phrase uttered by the user was identified by the virtual assistant as being related to querying a memo or a memo domain in dependence upon the virtual assistant identifying a request (e.g., an interrogatory) and trigger words such as “I” (a personal pronoun) and “cook” (a verb)].

(iii) Hound: “You should cook the lasagna 30 minutes in the oven.” [this response from the virtual assistant was generated by obtaining the stored memo or information relating to the memo that indicated the cooking time in the oven for lasagna is 30 minutes].

The second example relates to finding or locating lost objects. The scenario is that a user places an object somewhere (e.g., for hiding or storage), where the user wants to be sure to remember where the object was placed. Instead of writing a text, email or physical message to oneself, the user would have the following interaction with the virtual assistant.

(i) User: “Hound, remember that I put the car key in my brown bag.” [this phrase uttered from the user could be identified by the virtual assistant as being related to a memo or memo domain in dependence upon the virtual assistant identifying the wake phrase “Hound, remember.”]

(ii) User: “Ok Hound. Where did I put my car key?”

(iii) Hound: “You put your car key in your brown bag.”

Examples of Favorites

Another aspect of the technology disclosed relates to speech enabled virtual assistants implementing technology that intelligently stores favorite information of a user for subsequent retrieval and presentation to the user at the appropriate time. The concept is that favorite information of the user, such as favorite restaurants, grocery stores, beauty salons, gyms, recreation spots, parking garages, friends and family, etc., is stored and then later used to answer inquiries from the user. Three specific examples of this technology are provided below. This technology is capable of recording and intelligently storing favorites and related information and of retrieving information in dependence upon the stored favorites.

The first example relates to favorite places, and the scenario is that the user tells the virtual assistant about a favorite restaurant and then later on asks for directions to that restaurant.

(i) User: “Ok Hound, my favorite restaurant is Spice Me at Half Moon Bay.” [this information conveyed by the user triggered “favorites” or a favorites domain, and in particular a favorite restaurant, in dependence upon the trigger words “favorite” and “restaurant.”].

(ii) User: “Ok Hound, give me directions to my favorite restaurant.”

(iii) Hound: “Here you are . . . ” (and directions are provided to the user in one of various forms, such as spoken word, opening up a map or directions application, etc.).
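The store-then-use pattern in this first favorites example can be sketched in Python. This minimal sketch assumes that two regular expressions stand in for the favorites-domain grammar. `STORE_PATTERN`, `REFERENCE_PATTERN`, `favorites` and `handle` are hypothetical names. A real implementation would interpret the utterance with the domain's grammar rules and hand the resolved place to a navigation domain, rather than rewriting the text.

```python
from __future__ import annotations

import re

# Hypothetical pattern for statements such as "my favorite restaurant is X."
STORE_PATTERN = re.compile(r"my favorite (?P<category>\w+) is (?P<value>.+?)\.?$", re.IGNORECASE)
# Hypothetical pattern for later references such as "... to my favorite restaurant".
REFERENCE_PATTERN = re.compile(r"my favorite (?P<category>\w+)", re.IGNORECASE)

favorites: dict[str, str] = {}  # category -> stored favorite


def handle(utterance: str) -> str:
    stored = STORE_PATTERN.search(utterance)
    if stored:
        category = stored["category"].lower()
        favorites[category] = stored["value"]
        return f"OK, I'll remember your favorite {category}."
    ref = REFERENCE_PATTERN.search(utterance)
    if ref:
        category = ref["category"].lower()
        if category in favorites:
            # Substitute the stored favorite so a downstream domain can act on a concrete place.
            return utterance[: ref.start()] + favorites[category] + utterance[ref.end():]
        return f"I don't know your favorite {category} yet."
    return utterance
```

Calling `handle` with the utterance in (i) records “Spice Me at Half Moon Bay” under the category “restaurant”. Calling it with the utterance in (ii) returns the request with “my favorite restaurant” replaced by that place name.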

The second example relates to a routine commute, and the scenario is that the user goes to the same gym, bar, grocery store, etc. on a regular basis, so she tells the virtual assistant to remember this particular place as a favorite for later retrieval.

(i) User: “Ok Hound, the gym I usually go to is Orange Theory Fitness® in Santa Clara.” [favorites or favorites domain is triggered by the words “I” and “usually”].

(ii) User: “Ok Hound, how long will it take me to get to the gym?”

(iii) Hound: “It will take you 15 minutes to get to the gym.” [the virtual assistant utilizes the information of the user's favorite gym to determine which gym the user is referring to and then estimates how long it will take to get there using the typical transportation scheme used by the user to get to the gym in view of present traffic conditions].

The third example relates to making recommendations, and the scenario is that a user asks a virtual assistant for a recommendation, where the user has previously given the virtual assistant some information about favorite restaurants, etc., or perhaps where the user has not previously provided favorite information.

(i) User: “Ok Hound, give me a restaurant recommendation.”

(ii) Hound: “Tell me what kind of food you like.”

(iii) User: “I like Thai Food and Italian food the most.”

(iv) User: “Ok Hound, are there any restaurants around I might like?”

(v) Hound: “I have two restaurants that are close by that serve your favorite types of food, but based on the fact that you recently had Thai food I will recommend Pasta Moon Italian Restaurant at Half Moon Bay.”

Discussion of the Figures

Now, turning to the figures, various example aspects of the technology disclosed are provided below.

FIG. 1 illustrates a block diagram of an example environment 100 capable of speech enabled virtual assistants implementing technology that is capable of receiving a request or query and intelligently retrieving information contained in or derived from previously stored memos. The term “intelligently retrieving” is used because the environment 100, as discussed in further detail below, is capable of not just repeating a previous statement made by the user but deriving a more useful response for the user, as a result of having previously stored a memo or personal memo provided by the user.

In particular, FIG. 1 illustrates that the example environment 100 includes a speech input 102 being received from a microphone or some other type of input device (e.g., an application running on a mobile phone or tablet, etc.). The speech input 102 includes a search or query request 103 (hereinafter query 103). The query 103 can be in the form of a natural language utterance spoken by the user.

The speech input 102 can be received by a virtual assistant (not illustrated) as query 103. Speech enabled virtual assistants will simply be referred to herein as “virtual assistants” or a “virtual assistant.” A virtual assistant can be a device or an application residing on a device, such as a smart phone, a watch, glasses, a television, an automobile, etc. The virtual assistant is capable of interacting with a user using the user's speech and is capable of, for example, (i) providing information back to the user (e.g., an answer to a question), (ii) providing an actionary response (e.g., changing the thermostat or locking the doors to an automobile) or (iii) storing information for later retrieval (remotely or locally) or for increasing the knowledge base of the virtual assistant. A virtual assistant can monitor sound (e.g., conversations) to listen for a wake phrase that engages the virtual assistant and to listen for a trigger phrase uttered after the wake phrase that directs the virtual assistant (or any system in communication with the virtual assistant) to a particular domain. A wake phrase can be just one word or multiple words, and a trigger phrase can be just one word or multiple words.

Referring back to FIG. 1, the query 103 will be transcribed by the virtual assistant (or a system connected to the virtual assistant as described below with respect to FIG. 7) in operation 106. Next, in operation 106, text obtained from the transcriptions of the query 103 will be used to determine whether or not the user intended to query a particular domain, such as a memo domain 108. If the memo domain 108 is identified, then the text obtained from the transcriptions will be interpreted using a particular grammar rule.

Regarding domains and grammar rules, a domain represents a particular subject area, and comprises or is associated with a specific grammar rule. A specific grammar rule is not necessarily one single rule but can be a set of rules that are suited to interpret a transcription of a natural language utterance that is related to a specific domain. The process of interpreting a natural language utterance within a particular domain produces exactly one interpretation. Different interpretations arise when systems interpret a natural language utterance in the context of different domains. Each interpretation represents the meaning of the natural language utterance as interpreted by a domain. For example, users make requests, such as asking “What time is it?” or directing the system to “Send a message.” Systems provide responses, such as by speaking the time. Systems also make requests of users, such as by asking, “To whom would you like to send a message?”, and in reply, users respond, such as by replying, “Mom.” Sequences of one or more requests and responses produce results such as sending a message or reporting the time of day. The interactions regarding the “time” are interpreted, for example, using a “time domain” with a specific grammar rule that is suited for interpreting text related to time. The same is true for “messages,” which are handled by a “messages domain.” Sub-domains can also exist. The number of domains is limitless, as are the specific grammar rules implemented by or included in the domains. These are merely non-limiting examples of domains, grammar rules, transcriptions and interpretations.
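The sketch below makes the domain and grammar-rule terminology concrete. It models a domain as a named set of rules, each a regular expression paired with an intent. This is an assumption for illustration only: `Domain`, `Interpretation`, the intents and the patterns are hypothetical, and production grammars are far richer. Because the first matching rule wins, interpreting an utterance within one domain yields a single interpretation, or `None` when no rule in that domain applies. Different domains yield different interpretations of the same text.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Interpretation:
    domain: str
    intent: str
    slots: dict[str, str | None]  # named values recognized by the matching rule


@dataclass
class Domain:
    name: str
    # The domain's "grammar rule": a set of (intent, pattern) pairs.
    rules: list[tuple[str, re.Pattern[str]]] = field(default_factory=list)

    def interpret(self, text: str) -> Interpretation | None:
        for intent, pattern in self.rules:
            match = pattern.search(text)
            if match:
                # The first matching rule wins, so this domain yields one interpretation.
                return Interpretation(self.name, intent, match.groupdict())
        return None


time_domain = Domain("time", [("get_time", re.compile(r"\bwhat time is it\b", re.I))])
messages_domain = Domain(
    "messages",
    [("send_message", re.compile(r"\bsend (?:a )?message(?: to (?P<recipient>\w+))?", re.I))],
)

for domain in (time_domain, messages_domain):
    print(domain.name, domain.interpret("Send a message to Mom"))
```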

Turning back to FIG. 1, when the received natural language utterance expresses a request, the natural language utterance that expresses the request can be interpreted according to a natural language grammar rule for retrieving memo data. This rule is obtained from the memo domain 108. Further, the natural language grammar rule is applied to recognize query information from the natural language utterance (e.g., query 103). As an example, in operation 106 the received natural language utterance is “How long should I cook lasagna?”

Responsive to the interpretation and obtaining of the query information, an appropriate database will be searched or queried. According to one aspect of the present invention, in operation 110 a memo transcription database 112 can be queried using the interpreted natural language utterance. The memo transcription database 112 includes text from previous natural language utterances directed to personal memos. The memo transcription database 112 can be an unstructured or a structured database storing unstructured or structured data. However, as previously discussed, merely providing text back to a user that has not been interpreted according to a specific domain would not be as helpful to the user. An example of such text would be “To get a perfect lasagna, I cook it in the oven for 30 minutes.” This is just a simple transcription of a previously stored or recorded personal memo (e.g., a word-for-word repeat of a transcription). While this is not a perfect answer to the user's query, it still provides enough information. Additionally, the actual recording of the natural language utterance that expresses the query 103 can be stored in another database, or even in the memo transcription database 112 and/or the memo interpretation database 114. Further, the text stored in the memo transcription database 112 or the recording stored in another database can be retained for the purpose of later re-interpretation. For example, grammar rules of domains can be improved over time, thereby providing more accurate interpretations as time goes on. By storing the original text or recording that was used to create a first interpretation using the memo domain 108, it is possible to re-interpret the original text or recording once the grammar rules have been improved.
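Keeping the word-for-word transcription alongside its interpretation is what makes later re-interpretation possible. The sketch below assumes that each stored memo records which grammar version produced its interpretation. `StoredMemo`, `grammar_v1`, `grammar_v2` and `reinterpret_all` are hypothetical. The two regular-expression "grammars" merely stand in for an original and an improved memo-domain grammar rule.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable, Optional

Interpreter = Callable[[str], Optional[dict]]


@dataclass
class StoredMemo:
    transcription: str              # word-for-word text, kept for re-interpretation
    interpretation: Optional[dict]  # result of the grammar in use when the memo was stored
    grammar_version: int


def grammar_v1(text: str) -> Optional[dict]:
    # Original rule: only understands "cook <dish> for <n> minutes".
    m = re.search(r"cook (?:the )?(\w+) for (\d+) minutes", text, re.I)
    return {"action": "cook", "dish": m[1], "minutes": m[2]} if m else None


def grammar_v2(text: str) -> Optional[dict]:
    # Improved rule: also understands "perfect <dish>, I cook it in the <appliance> for <n> minutes".
    found = grammar_v1(text)
    if found:
        return found
    m = re.search(r"perfect (\w+), I cook it in the (\w+) for (\d+) minutes", text, re.I)
    return {"action": "cook", "dish": m[1], "appliance": m[2], "minutes": m[3]} if m else None


def reinterpret_all(memos: list[StoredMemo], interpret: Interpreter, version: int) -> int:
    """Re-run an improved grammar over memos interpreted by an older version."""
    updated = 0
    for memo in memos:
        if memo.grammar_version < version:
            memo.interpretation = interpret(memo.transcription)
            memo.grammar_version = version
            updated += 1
    return updated


text = "To get a perfect lasagna, I cook it in the oven for 30 minutes."
memo = StoredMemo(text, grammar_v1(text), grammar_version=1)  # v1 cannot parse this phrasing
reinterpret_all([memo], grammar_v2, version=2)
print(memo.interpretation)
```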

According to another aspect of the present technology, in operation 110, a memo interpretation database 114 is queried using the interpreted natural language utterance. The memo interpretation database 114 includes interpretations of natural language utterances directed to personal memos. The memo interpretation database 114 can be an unstructured or a structured database storing unstructured or structured data. Because the interpretations of the natural language utterances are made using a particular natural language grammar rule associated with the memo domain 108, the information stored in and retrieved from the memo interpretation database 114 will be easier to search and provide more accurate and meaningful results. An example memo retrieved from the memo interpretation database 114 could be structured data, such as “cook.lasagna.oven.30-minutes”, that can be used to generate a response. Alternatively, an example memo retrieved from the memo interpretation database 114 could already be in a form that is phrased as a natural language response, such as “Violet, you should cook your lasagna in your oven for 30 minutes.”

After obtaining the memo from the memo transcription database 112 or the memo interpretation database 114 in operation 110, operation 118 generates an appropriate answer (response) for the user. As discussed above and in further detail below, an aspect of the technology disclosed is capable of providing a meaningful (appropriate) response to the user. That response is not necessarily a word-for-word repeat of a previously stored transcription, but something that is sufficient for, and more helpful in, answering the user's request or query. If operation 110 obtains the memo from the memo transcription database 112, then the memo can be further interpreted using the specific grammar rule for retrieving memo data. For example, the retrieved memo “To get a perfect lasagna, I cook it in the oven for 30 minutes” could be interpreted to generate a response such as “Violet, you should cook your lasagna in your oven for 30 minutes.” If the memo retrieved from the memo interpretation database 114 is structured as “cook.lasagna.oven.30-minutes,” the system will generate “Violet, you should cook your lasagna in your oven for 30 minutes” as an appropriate response. Once the appropriate response or answer is generated in operation 118, it will be provided to the user in operation 120, in the form of speech 122 or a message/text to a mobile device 124 or some other similar device.
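The step from the structured memo “cook.lasagna.oven.30-minutes” to the spoken response can be sketched as a template fill. The field order (action, object, location, duration) and the `<amount>-<unit>` duration format are assumptions inferred from this single example. `respond_from_structured` is a hypothetical helper, not part of any specified interface.

```python
def respond_from_structured(memo: str, user_name: str) -> str:
    """Turn an assumed action.object.location.duration record into a response."""
    fields = memo.split(".")
    if len(fields) != 4:
        raise ValueError(f"unexpected memo structure: {memo!r}")
    action, obj, location, duration = fields
    amount, _, unit = duration.partition("-")  # "30-minutes" -> ("30", "-", "minutes")
    return f"{user_name}, you should {action} your {obj} in your {location} for {amount} {unit}."


print(respond_from_structured("cook.lasagna.oven.30-minutes", "Violet"))
# Violet, you should cook your lasagna in your oven for 30 minutes.
```

A fixed template like this is only practical when the interpretation has a known shape. A memo retrieved as raw transcription would first need to be interpreted as described above.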

FIG. 2 illustrates a block diagram of an example environment capable of speech or text enabled virtual assistants implementing technology that is capable of recording voice memorandums (i.e., memos) and intelligently storing the memos along with information derived from the memos.

Specifically, FIG. 2 illustrates an environment 200 that implements the storing of a natural language utterance in the memo transcription database 112 and/or the memo interpretation database 114. The environment of FIG. 2 is very similar to that of FIG. 1, except that a statement 203 is received that causes the virtual assistant to store some or all of the statement 203 as a memo as opposed to conducting a query. Descriptions of redundant elements of FIG. 2 are omitted.

In operation 206 the statement 203 is transcribed and then a domain, such as the memo domain 108, is identified. Just as in FIG. 1, where the query 103 is transcribed and interpreted, the text transcribed from the statement 203 is interpreted using a specific grammar rule for storing a memo that is associated with or included in the memo domain 108. For example, the natural language utterance (e.g., statement 203) received from the user can be interpreted according to a natural language grammar rule for storing memo data. In operation 210 the memo, obtained from the transcription of the natural language utterance, is stored as a transcription in the memo transcription database 112. In operation 212 the memo, obtained from an interpretation of the natural language utterance, is stored in the memo interpretation database 114. Additionally, the actual recording of the natural language utterance that expresses the statement 203 can be stored in another database, or even in the memo transcription database 112 and/or the memo interpretation database 114. The differences between transcriptions and interpretations and between the memo transcription database 112 and the memo interpretation database 114 are described above in detail with reference to FIG. 1.
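As a concrete illustration of operations 210 and 212, the sketch below keeps the two stores as tables in one SQLite database, using Python's built-in `sqlite3` module. It writes both rows in a single transaction and links each interpretation to the transcription it came from. The table and column names, the `store_memo`/`find_structured` helpers and the dotted structured format are assumptions for illustration. The disclosure does not prescribe a schema, and the two databases could equally be separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE memo_transcription (
        id INTEGER PRIMARY KEY,
        user_id TEXT NOT NULL,
        text TEXT NOT NULL,         -- word-for-word transcription (operation 210)
        created_at TEXT NOT NULL    -- ISO-8601 timestamp
    );
    CREATE TABLE memo_interpretation (
        id INTEGER PRIMARY KEY,
        transcription_id INTEGER NOT NULL REFERENCES memo_transcription(id),
        structured TEXT NOT NULL    -- interpreted memo (operation 212)
    );
    """
)


def store_memo(user_id: str, text: str, structured: str, created_at: str) -> int:
    with conn:  # commit both inserts together, or roll both back on error
        cur = conn.execute(
            "INSERT INTO memo_transcription (user_id, text, created_at) VALUES (?, ?, ?)",
            (user_id, text, created_at),
        )
        conn.execute(
            "INSERT INTO memo_interpretation (transcription_id, structured) VALUES (?, ?)",
            (cur.lastrowid, structured),
        )
    return cur.lastrowid


def find_structured(user_id: str, prefix: str) -> list:
    # Newest first, so the most recent matching memo is at index 0.
    return conn.execute(
        "SELECT t.text, i.structured FROM memo_interpretation AS i "
        "JOIN memo_transcription AS t ON t.id = i.transcription_id "
        "WHERE t.user_id = ? AND i.structured LIKE ? ORDER BY t.created_at DESC",
        (user_id, prefix + "%"),
    ).fetchall()
```

Keeping the transcription row linked to its interpretation also supports the re-interpretation described with reference to FIG. 1.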

In operation 214 feedback is provided to the user in the form of speech 122 or a message/text to a mobile device 124 or some other similar device. The speech can include a request asking the user to confirm whether or not they intended to store a personal memo, or a confirmation to the user that the information has been stored as a personal memo.

One aspect of the technology disclosed includes assigning a time period to a memo after which the memo will expire and then removing the memo (or memo related information) from the memo transcription database 112 and/or the memo interpretation database 114.
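The following is a minimal sketch of memo expiry. It assumes that each memo may carry an optional time-to-live and that a periodic `purge` call removes expired entries. `ExpiringMemos`, `add`, `purge` and `texts` are hypothetical names, and a dictionary stands in for the two memo databases.

```python
from __future__ import annotations

from datetime import datetime, timedelta


class ExpiringMemos:
    """Hypothetical memo store in which each memo may carry an expiry time."""

    def __init__(self) -> None:
        # memo_id -> (memo text, expiry time or None for memos that never expire)
        self._memos: dict[int, tuple[str, datetime | None]] = {}

    def add(self, memo_id: int, text: str, now: datetime, ttl: timedelta | None = None) -> None:
        expires_at = now + ttl if ttl is not None else None
        self._memos[memo_id] = (text, expires_at)

    def purge(self, now: datetime) -> list[int]:
        """Remove memos whose time period has elapsed; return their ids."""
        expired = [mid for mid, (_, exp) in self._memos.items() if exp is not None and exp <= now]
        for mid in expired:
            del self._memos[mid]
        return expired

    def texts(self) -> list[str]:
        return [text for text, _ in self._memos.values()]
```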

Another aspect of the technology disclosed includes interpreting the query 103 and/or the statement 203 according to multiple domains (e.g., multiple grammar rules), wherein each domain of the multiple domains has an associated relevance score for the interpreted utterance. The memo domain 108 is one domain of the multiple domains, and the memo domain 108 has an advantage over the other domains with respect to interpreting queries and statements related to personal memos. As such, when the query 103 and/or the statement 203 is directed to a personal memo, the interpretation using the memo domain 108 will have the highest relevance score as compared to the other domains. Additionally, different interpretations of the query 103 and/or the statement 203 using the multiple domains can be stored in the memo interpretation database 114.
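One way to picture the memo domain's advantage is a multiplicative weight applied to per-domain relevance scores before the highest-scoring domain is chosen. In the sketch below, the keyword-overlap scorer, the cue-word sets and the 1.5 weight are all hypothetical placeholders for whatever scoring a real system uses. Only the score-then-select structure is meant to mirror the description.

```python
from __future__ import annotations

import re
from typing import Callable

Scorer = Callable[[str], float]


def cue_scorer(cues: set[str]) -> Scorer:
    """Score = fraction of a domain's cue words present in the utterance (a toy metric)."""
    def score(text: str) -> float:
        words = set(re.findall(r"[a-z']+", text.lower()))
        return len(words & cues) / len(cues)
    return score


def best_domain(
    text: str, scorers: dict[str, Scorer], weights: dict[str, float] | None = None
) -> tuple[str, float]:
    weights = weights or {}
    # Apply each domain's weight (default 1.0) to its raw relevance score.
    scored = {name: scorer(text) * weights.get(name, 1.0) for name, scorer in scorers.items()}
    name = max(scored, key=scored.get)
    return name, scored[name]


scorers = {
    "memo": cue_scorer({"i", "my", "remember", "put", "cook"}),
    "weather": cue_scorer({"weather", "rain", "sunny", "forecast"}),
}
print(best_domain("Where did I put my car key?", scorers, weights={"memo": 1.5}))
```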

The information stored in the memo interpretation database 114 can be stored along with additional information, such as meta-data or meta-information. This meta-data can describe the memo as pertaining to a short-term activity, daily weather, or an until-event. An example of an until-event is a child being at soccer practice, which is cancelled (or deleted) when the parent arrives and then leaves the soccer field after picking up the child. The meta-data or meta-information can be explicitly stated by the user (e.g., “I'll be at work until 5 pm”) or it can be inferred from other information obtained from the user, such as other personal memos, other calendar information or other routine information obtained from general tendencies of the user.
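The until-event metadata can be sketched as a memo kind plus the name of the event that ends it. Here `MemoMeta`, `on_event`, the kind labels and the event name `"left_soccer_field"` are hypothetical. Detecting that the parent has left the field (for example, from location data) is outside the scope of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MemoMeta:
    text: str
    kind: str                       # e.g. "short-term", "daily-weather", "until-event"
    until_event: str | None = None  # event that cancels an until-event memo


def on_event(memos: list[MemoMeta], event: str) -> list[MemoMeta]:
    """Return the memos that survive once `event` has been observed."""
    return [m for m in memos if not (m.kind == "until-event" and m.until_event == event)]


memos = [
    MemoMeta("My son is at soccer practice.", "until-event", until_event="left_soccer_field"),
    MemoMeta("Bring an umbrella today.", "daily-weather"),
]
print([m.text for m in on_event(memos, "left_soccer_field")])
```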

Additional examples of storing personal memos and then retrieving information related to the stored personal memos are provided below.

Example Wake Phrases and Trigger Phrases for Storage and Retrieval

As mentioned above, virtual assistants or related devices often have wake phrases to indicate to the virtual assistant that the user is attempting to engage or use the virtual assistant. Assume that the technology disclosed uses a standard wake phrase of “Ok Hound” to engage the virtual assistant. One way to indicate that a user's utterance is intended to retrieve information from a stored personal memo would be to assign specific wake phrases, such as “Ok Hound check my personal information for . . . ,” or “Hound check my memos for information regarding . . . ”. Further, one way to indicate that a user's utterance is intended to be stored as a personal memo would be to assign specific wake phrases, such as “Ok Hound memo,” “Hound memo” or “Ok Hound remember.” Each of these example wake phrases would immediately indicate that the user intends to retrieve or store a personal memo. However, users sometimes have difficulty remembering which wake phrases to use in which situation.
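Routing on the dedicated wake phrases listed above could be as simple as a prefix match after normalizing case and punctuation. This sketch is an assumption about one possible implementation. `route_by_wake_phrase`, `normalize` and the intent labels are hypothetical, and a real system might instead detect wake phrases on the audio itself before transcription.

```python
from __future__ import annotations

import re

RETRIEVE_WAKE_PHRASES = (
    "ok hound check my personal information for",
    "hound check my memos for information regarding",
)
STORE_WAKE_PHRASES = ("ok hound memo", "hound memo", "ok hound remember")


def normalize(text: str) -> str:
    # Lower-case, turn punctuation into spaces, collapse runs of whitespace.
    return " ".join(re.sub(r"[^\w\s']", " ", text.lower()).split())


def route_by_wake_phrase(utterance: str) -> tuple[str | None, str]:
    text = normalize(utterance)
    for intent, phrases in (("retrieve_memo", RETRIEVE_WAKE_PHRASES), ("store_memo", STORE_WAKE_PHRASES)):
        for phrase in phrases:
            # Require a word boundary so "hound memo" does not match "hound memorize".
            if text == phrase or text.startswith(phrase + " "):
                return intent, text[len(phrase):].strip()
    return None, text  # generic wake phrase: fall back to trigger-phrase detection
```

For example, “Ok Hound remember that I put the car key in my brown bag.” routes to `"store_memo"` with the remainder “that i put the car key in my brown bag”. An utterance beginning with only the generic “Ok Hound” returns `None` as the intent.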

Accordingly, the technology disclosed is capable of determining whether or not a natural language utterance received after a generic wake phrase includes a specific trigger phrase indicating that the user intends to search for a memo or store a memo. For the sake of simplicity, a “trigger phrase” can include a single word or multiple words, and a “wake phrase” can likewise include a single word or multiple words. The wake phrase and trigger phrase can be used to make the system understand that it should record, store and retrieve the information to and from the “memo domain”. Additionally, a weight can be applied to the “memo domain” so that it is the first domain (of multiple domains) considered when retrieving information.

The trigger phrases can include personal pronouns, such as “I” (e.g., “Where did I put the key?”, “How long do I usually cook lasagna?”), or possessives like “my” (e.g., “Where is my key?”). As another example, a trigger phrase may be identified as an interrogative pronoun or a relative pronoun that is within 5 words of the personal pronoun, or as a personal pronoun followed or preceded by an interrogative pronoun or a relative pronoun that is within 5 words of the personal pronoun. These are merely examples of the types of phrases that can be configured to indicate that the user is attempting to retrieve or store a personal memo.
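The proximity rule can be sketched as a token-window check. The word lists are assumptions: strictly, “where” and “how” are interrogative adverbs rather than pronouns, but they are included because the example queries above rely on them. `tokenize` and `has_memo_trigger` are hypothetical names, and the 5-word window is interpreted here as a difference of at most 5 token positions.

```python
import re
from typing import List

# Assumed word lists for illustration only.
PERSONAL_WORDS = {"i", "me", "my", "mine"}
WH_WORDS = {"what", "which", "who", "whom", "whose", "that", "where", "when", "how"}


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z']+", text.lower())


def has_memo_trigger(text: str, window: int = 5) -> bool:
    """True if a personal word and a wh-word occur within `window` words of each other."""
    tokens = tokenize(text)
    personal = [i for i, t in enumerate(tokens) if t in PERSONAL_WORDS]
    wh = [i for i, t in enumerate(tokens) if t in WH_WORDS]
    return any(abs(p - w) <= window for p in personal for w in wh)
```

Using the absolute distance covers both orders the text mentions (personal pronoun followed by, or preceded by, the other pronoun).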

Once the trigger phrase is identified, the appropriate domain (e.g., memo domain 108) is selected, and an appropriate grammar rule can also be selected in dependence upon the trigger phrase itself, other contents of the natural language utterance, or a combination of both.

Cooking Example

For each domain, it is possible to (i) determine and assign all of the possible ways a user would store a personal memo, (ii) retrieve information from the stored personal memo and (iii) determine all of the ways for the virtual assistant to respond to the user.

FIG. 9 illustrates TABLE 1, which includes example phrases that would trigger the storing of a personal memo in the memo domain 108 or in a specific sub-domain (e.g., cooking) of the memo domain 108. There can be multiple stages of complexity in how the virtual assistant understands a request and provides an answer to the user. The stage implemented by the virtual assistant can depend on many factors, such as available processing power, communication bandwidth, certainty of interpretations and the content of personal memos.

Stage 1 examples require the stored memo and the query to be of a similar nature, and the response is similar in nature as well. This is essentially a one-to-one correlation between the stored memo, the request and the response. This is the least complex of the stages, because the response is closely tied to the query. For example, in the first example of stage 1, the query states “do I usually leave . . . in the oven.” and the response states “you usually leave . . . in the oven.”

Stage 2 examples allow more information to be inferred from the stored memo and from the query for the memo, and allow different answers to be derived from the stored memo. Note that the arrows on the first row of stage 2 indicate that the utterance used to invoke storage can be queried using three different options and that there are three possible responses. In other words, each cell of stage 2 has three counterpart cells. Although the arrows do not show this for the second and third rows of stage 2 due to space constraints on TABLE 1, the same applies to those rows. For example, in the second row of stage 2, the user can state “To get a perfect lasagna leave it for 30 minutes in the oven.” This personal memo can then be queried in at least three different ways. In this example, suppose that the user initiates the query using the phrase “How many minutes should I cook lasagna?” This differs from stage 1 because the virtual assistant accepts a broader range of potential queries that could result in finding a particular personal memo. The same goes for the response provided by the virtual assistant: a response to the query “How many minutes should I cook lasagna?” could be “You usually leave your lasagna in the oven for 30 minutes.” as opposed to “you should cook your lasagna for 30 minutes.” The virtual assistant can select a particular response based on previous responses that have been successful and/or unsuccessful (e.g., due to the user's vocabulary, certain responses can be more successful than others).

Stage 3 is the most complex stage, because it allows additional information to be derived from the stored memo, not just the cooking time. In the example for stage 3, the user most likely invoked the storage of the memo with a statement directed to the length of time for cooking lasagna, without thinking about later retrieving an answer as to “where” the lasagna should be cooked. However, the virtual assistant identified at least two pieces of information in the memo: that the lasagna is cooked in the oven and that it is cooked for 30 minutes. Therefore, the virtual assistant can answer two different types of questions: those about how long to cook the lasagna and those about where the lasagna should be cooked.
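A toy version of the stage 3 behavior, where one stored memo yields answers to both “how long” and “where” questions, might look like this. The regular expression, its fixed vocabularies of dishes and appliances, and the response templates are all assumptions made for illustration; a real grammar rule would be far more general.

```python
import re
from typing import Dict, Optional, Union

# Hypothetical pattern with small fixed vocabularies; real grammar rules would be broader.
COOK_PATTERN = re.compile(
    r"\b(?P<dish>lasagna|pizza|bread)\b.*?\bfor (?P<minutes>\d+) minutes?\b"
    r".*?\bin the (?P<place>oven|microwave|grill)\b",
    re.IGNORECASE,
)


def extract_facts(memo: str) -> Optional[Dict[str, Union[str, int]]]:
    """Pull every fact the pattern can find out of one stored memo."""
    m = COOK_PATTERN.search(memo)
    if m is None:
        return None
    return {"dish": m["dish"].lower(), "minutes": int(m["minutes"]), "place": m["place"].lower()}


def answer(question: str, facts: Dict[str, Union[str, int]]) -> Optional[str]:
    """Answer either a duration question or a location question from the same facts."""
    q = question.lower()
    if "how long" in q or "how many minutes" in q:
        return (f"You usually leave your {facts['dish']} in the "
                f"{facts['place']} for {facts['minutes']} minutes.")
    if "where" in q:
        return f"You cook your {facts['dish']} in the {facts['place']}."
    return None
```

Given the stored memo “To get a perfect lasagna leave it for 30 minutes in the oven.”, both “How many minutes should I cook lasagna?” and “Where should I cook lasagna?” can be answered from the same extracted facts.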

Lost Objects Example

FIG. 10 illustrates TABLE 2, which includes example phrases that would trigger the storing of a personal memo in the memo domain 108 or in a specific sub-domain (e.g., object location) of the memo domain 108, as well as ways to query the personal memo and possible responses from the virtual assistant. TABLE 2 differs from TABLE 1 because TABLE 2 also includes examples of grammar rules and sentence parsing that can be implemented to store memos along with additional information, and shows how the memo and the additional information can be used to identify a query and structure a response. As described in TABLE 2, each sentence used to invoke storage of a memo is parsed to identify its components. For example, in the first row of TABLE 2, the virtual assistant identifies the personal pronoun “I” and then looks for a verb near the “I”. Here, any verb such as “put”, “am putting”, “'ll put” or “will put” that follows the “I” indicates to the virtual assistant that the utterance received from the user is about the user putting an object somewhere. Continuing with this example, after the verb, the virtual assistant then looks for a variable (i.e., variable1, e.g., keys) that is likely to be put somewhere. Next, the virtual assistant looks for another variable (i.e., variable2) describing where variable1 is placed. Once this personal memo is stored with the additional information obtained from parsing the utterance, the memo can be queried when the user asks a question including any variation of the verb “put” along with variable1 (e.g., keys). Row 1 of TABLE 2 also describes the structure of the response with respect to the information included in the initial statement from the user and the subsequent query.
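The row-1 style parse described above can be approximated with a regular expression. This is a hypothetical sketch, since the exact rules of TABLE 2 are not reproduced here: the put-verb forms come from the text, while the preposition list, the optional “my”/“the” determiners, the query verbs and the response template are assumptions.

```python
import re
from typing import Dict, Optional, Tuple

# "I" + a form of "put" + variable1 (object) + preposition + variable2 (place).
STORE_RULE = re.compile(
    r"\bI(?:\s+am\s+putting|\s+will\s+put|'ll\s+put|\s+put)\s+"
    r"(?:my\s+|the\s+)?(?P<object>[\w ]+?)\s+"
    r"(?P<prep>in|on|under|behind)\s+(?:my\s+|the\s+)?(?P<place>[\w ]+)",
    re.IGNORECASE,
)
QUERY_VERB = re.compile(r"\b(?:put|putting|placed)\b", re.IGNORECASE)


def parse_store(utterance: str) -> Optional[Tuple[str, str, str]]:
    """Return (variable1, preposition, variable2) or None if the rule does not match."""
    m = STORE_RULE.search(utterance)
    if m is None:
        return None
    return m["object"].lower(), m["prep"].lower(), m["place"].strip().lower()


def answer_query(question: str, memos: Dict[str, Tuple[str, str]]) -> Optional[str]:
    """Match a 'put' question that mentions a stored variable1."""
    if not QUERY_VERB.search(question):
        return None
    q = question.lower()
    for obj, (prep, place) in memos.items():
        if obj in q:
            return f"You put your {obj} {prep} the {place}."
    return None
```

For example, storing “I'll put my keys in the kitchen drawer.” and later asking “Where did I put my keys?” yields “You put your keys in the kitchen drawer.”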

Invoking User Feedback

The system may invoke user feedback to confirm whether or not a user intended to search for an answer based on a personal memo or to store a personal memo. If the user indicates that they did not intend to query a personal memo, then a different domain is used to respond to the user's question. If the user indicates that they did not intend to store an utterance as a personal memo, then the personal memo is not stored, or it is deleted if it was already stored. The confirmation requests to the user can be auditory or textual, and the user's responses to the confirmation requests can likewise be auditory or textual. Additionally, if the virtual assistant cannot locate a memo that answers the user's request, the virtual assistant can ask for clarification.

Dealing with Multiple Related Memos

A user can store and query multiple memos that are related to the same subject. For example, a user may indicate that they put their keys in a refrigerator for safekeeping. Later, the user may indicate that they put their keys in their backpack. Now, when the user asks where their keys are, the virtual assistant should be able to tell the user that their keys are in their backpack. This scenario can be handled in many different ways. First, the virtual assistant may store each memo with time information and then assume that, when the user asks about the location of their keys, the user is referring to the most recent memo about their keys. This essentially time-orders all of the memos related to the location of the user's keys. By saving all of the memos regarding the location of the user's keys, the virtual assistant is also able to tell the user where they placed the keys before placing them in the backpack. This would be helpful if the user did not actually put them in the backpack; in that case, the user would probably find their nicely cooled keys in the refrigerator. To accomplish this, the virtual assistant would parse search-type statements to identify entities and attributes of the entities; search a database of memo information for the entity; and, among the database records related to the entity, select the most recent one relating to the same attribute. In this example, the entity would be keys and the attribute would be location.
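The time-ordering approach can be sketched in a few lines: keep every record, answer with the newest one for the entity/attribute pair, and keep the older ones available as history. `MemoRecord`, `most_recent` and `history` are illustrative names, and entity/attribute extraction is assumed to have already happened upstream.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class MemoRecord:
    entity: str       # e.g. "keys"
    attribute: str    # e.g. "location"
    value: str        # e.g. "backpack"
    stored_at: datetime


def matching(records: List[MemoRecord], entity: str, attribute: str) -> List[MemoRecord]:
    return [r for r in records if r.entity == entity and r.attribute == attribute]


def most_recent(records: List[MemoRecord], entity: str, attribute: str) -> Optional[MemoRecord]:
    """Answer with the newest memo about this entity/attribute pair."""
    return max(matching(records, entity, attribute), key=lambda r: r.stored_at, default=None)


def history(records: List[MemoRecord], entity: str, attribute: str) -> List[MemoRecord]:
    """All matching memos, newest first, so older answers stay available."""
    return sorted(matching(records, entity, attribute), key=lambda r: r.stored_at, reverse=True)
```

In the keys example, `most_recent` returns the backpack memo, and the second entry of `history` still points to the refrigerator.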

A second option would be to delete all previous memos relating to the location of the user's keys upon storing the most recent memo about the user's keys being in the backpack. To accomplish this, the virtual assistant would parse store-type statements to identify entities and attributes; search a database for records about the same attribute of the same entity (only one should be found); delete that record; and store a new record with the new information about the entity and its attribute.
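The second option (replace on store) maps naturally onto a delete-then-insert inside one database transaction. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table layout and the function names are assumptions, not part of the disclosure.

```python
import sqlite3
from typing import Optional


def open_memo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE memos (entity TEXT, attribute TEXT, value TEXT, stored_at TEXT)"
    )
    return conn


def store_replacing(conn: sqlite3.Connection, entity: str, attribute: str,
                    value: str, stored_at: str) -> None:
    """Delete any earlier record for this entity/attribute, then store the new one."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "DELETE FROM memos WHERE entity = ? AND attribute = ?", (entity, attribute)
        )
        conn.execute(
            "INSERT INTO memos (entity, attribute, value, stored_at) VALUES (?, ?, ?, ?)",
            (entity, attribute, value, stored_at),
        )


def lookup(conn: sqlite3.Connection, entity: str, attribute: str) -> Optional[str]:
    row = conn.execute(
        "SELECT value FROM memos WHERE entity = ? AND attribute = ?", (entity, attribute)
    ).fetchone()
    return row[0] if row else None
```

Wrapping both statements in one transaction avoids a window in which the old record is gone but the new one has not yet been written.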

Additionally, the technology disclosed can understand when memos relate to changes in time. For example, a user might say “Ok Hound, remember that I pick up my dog every day of the workweek at 5 pm from doggy daycare” (a memo that applies to every Monday through Friday) or “Ok Hound, remember that today I pick up the dog at 4 pm from doggy hair salon” (a memo that applies to a specific day). Specific trigger phrases that help indicate these behaviors include “every day,” “today,” and “tomorrow.”
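One assumed way to turn these trigger phrases into a time scope is shown below. Only the phrases “every day of the workweek”, “every day”, “today” and “tomorrow” are handled; the function names and the plain substring matching are illustrative simplifications.

```python
from datetime import date, timedelta
from typing import FrozenSet, Tuple, Union

WORKWEEK = frozenset(range(5))   # Monday=0 ... Friday=4, per date.weekday()
ALL_WEEK = frozenset(range(7))


def time_scope(text: str, stored_on: date) -> Tuple[str, Union[FrozenSet[int], date, None]]:
    """Classify a memo as recurring (set of weekdays) or one-off (a single date)."""
    t = text.lower()
    if "every day of the workweek" in t:
        return "recurring", WORKWEEK
    if "every day" in t:            # checked after the more specific phrase
        return "recurring", ALL_WEEK
    if "tomorrow" in t:
        return "once", stored_on + timedelta(days=1)
    if "today" in t:
        return "once", stored_on
    return "unspecified", None


def applies_on(text: str, stored_on: date, day: date) -> bool:
    """Does a memo stored on `stored_on` apply to `day`?"""
    kind, value = time_scope(text, stored_on)
    if kind == "recurring":
        return day.weekday() in value
    if kind == "once":
        return day == value
    return False
```

With the two example memos stored on a Monday, the daycare memo applies on that Friday but not on Saturday, while the salon memo applies only on the day it was stored.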

FIG. 3 illustrates a block diagram of an example environment capable of speech-enabled virtual assistants implementing technology that intelligently retrieves and presents favorite information of a user contained in or derived from previously identified and stored favorites.

Specifically, the environment 300 illustrated in FIG. 3 is similar to the environment 100 of FIG. 1, except that the query 103 is directed to a favorites domain 308 for the purpose of obtaining information from a favorites transcription database 312 or a favorites interpretation database 314. The favorites domain 308 is similar to the memo domain 108 of FIG. 1, except that the favorites domain 308 has a different grammar rule for interpreting the query 103. Furthermore, the favorites transcription database 312 stores transcriptions of previously stored natural language utterances related to “favorites” of a user, and the favorites interpretation database 314 stores interpretations of natural language utterances related to “favorites” of a user.

Generally, favorites are different from personal memos, because they are inherently narrower in scope and have a longer duration of relevance. Some example categories of favorites could be favorite types of food, grocery stores, hotels, friends, gymnasiums or recreation facilities, hair dressers, schools, colleges, sports teams, etc.

FIG. 4 illustrates a block diagram of an example environment capable of speech-enabled virtual assistants implementing technology that is capable of receiving favorites and intelligently storing the favorites along with information derived from the favorites.

The environment 400 of FIG. 4 is similar to the environment 200 of FIG. 2, except that the statement 203 is (i) interpreted using the favorites domain 308, (ii) transcribed and stored in the favorites transcription database 312 and (iii) interpreted for storage in the favorites interpretation database 314. All of the descriptions provided above with respect to FIGS. 1 and 2 and memos are applicable to the storing and retrieval of favorites and of information derived from the favorites. For example, wake phrases, trigger phrases, etc., are applicable to favorites. Additionally, a memo and/or memo-related information can indicate that a specific entity is a favorite of the user. Some examples of retrieving favorite information of the user and storing information related to a user's favorites are discussed below.

FIG. 11 illustrates TABLE 3, which includes some example ways of invoking the storing of favorite information, querying favorite information and possible responses from a virtual assistant.

FIG. 12 illustrates TABLE 4, which is similar to TABLE 3, except that it illustrates some example ways of using favorite information for obtaining directions and travel information.

FIG. 13 illustrates TABLE 5, which is similar to TABLE 4, except that it illustrates some example ways of storing multiple favorites for a specific category and then later obtaining specific information for both of the favorites in the same category, or obtaining favorite information for multiple favorites based on geographic location.

Other example implementations of “favorites” can include building a recommendations table based on the user's stored favorites. Here is an example: (i) User: “I like Red Lobster® Restaurant”; (ii) Virtual Assistant: obtains information regarding Red Lobster Restaurant from another service, such as Yelp® (e.g., Seafood/Bar/Kids' menu/Casual & Cozy/3.9 stars/etc.); (iii) User: “Are there any restaurants around here I might like?”; (iv) Virtual Assistant: “There are other restaurants in the area that have similar characteristics and ratings to your other favorites, such as Fish Market Restaurant in San Mateo. Would you like me to provide you with a full list of options?”
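A minimal sketch of the recommendation step, assuming each restaurant has already been reduced to a set of attribute tags (as in the Yelp®-style attributes above). The similarity measure is a swapped-in choice, Jaccard similarity (shared tags divided by all tags); the threshold, the candidate names and their tags are hypothetical, and ratings are ignored for brevity.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of tags two places have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(favorites: Dict[str, Set[str]], candidates: Dict[str, Set[str]],
              threshold: float = 0.5) -> List[str]:
    """Candidates ranked by their best similarity to any stored favorite."""
    scored = []
    for name, tags in candidates.items():
        best = max((jaccard(tags, fav) for fav in favorites.values()), default=0.0)
        if best >= threshold:
            scored.append((best, name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties by name
    return [name for _, name in scored]
```

A production system would likely weight attributes differently (e.g., cuisine more than ambiance) and fold in ratings and distance; the set-overlap score only illustrates the idea of matching on stored characteristics.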

FIGS. 5A, 5B and 5C show three example implementations of the technology disclosed using different types of virtual assistants. For example, FIG. 5A illustrates a mobile phone 502. Because mobile phones are battery-powered, it is important to minimize complex computations so as not to run down the battery. Therefore, the mobile phone 502 may connect over the Internet to a server. The mobile phone 502 has a visual display that can provide information in some use cases. However, the mobile phone 502 also has a speaker, and in some use cases the mobile phone 502 may respond to an utterance using only speech.

FIG. 5B illustrates a home assistant device 504, which may plug into a stationary power source, so it has the power to do more advanced local processing than the mobile phone 502. Like the mobile phone 502, the home assistant device 504 may rely on a cloud server for interpretation of utterances according to specialized domains, and in particular domains that require dynamic data to form useful results. Because the home assistant device 504 has no display, it is a speech-only device.

FIG. 5C illustrates an automobile 506. The automobile 506 may be able to connect to the Internet through a wireless network. However, if driven away from an area with a reliable wireless network, the automobile 506 must process utterances, respond, and give appropriate results reliably using only local processing. As a result, the automobile 506 can run software locally for natural language utterance processing. Though many automobiles have visual displays, to avoid distracting drivers in dangerous ways, the automobile 506 may provide results with speech-only requests and responses, or may provide results to a display that only non-driving passengers view and interact with.

FIG. 6 shows an overhead view of an automobile 600 designed to implement the technology disclosed. The automobile 600 has two front seats 602, either of which can hold one person. The automobile 600 also has a back seat 604 that can hold several people. The automobile 600 has a driver information console 606 that displays basic information such as speed and energy level. The automobile 600 also has a dashboard console 608 for more complex human interactions that cannot be quickly conducted by speech, such as viewing and tapping locations on navigational maps.

The automobile 600 has side bar microphones 610 and a ceiling-mounted console microphone 612, all of which receive speech audio so that a digital signal processor embedded within the automobile can run an algorithm to distinguish between speech from the driver and speech from the front-seated passenger. The automobile 600 also has a rear ceiling-mounted console microphone 614 that receives speech audio from rear-seated passengers.

The automobile 600 also has a car audio sound system with speakers. The speakers can play music but also produce speech audio for spoken responses to user commands and results. The automobile 600 also has an embedded microprocessor. It runs software stored on non-transitory computer-readable media that instructs the processor to perform some or all of the operations discussed with reference to the algorithm of FIGS. 1-5, 7 and 8, among other functions.

FIG. 7 illustrates an example environment 700 in which personal memos and/or favorites (or information derived therefrom) can be stored, searched for retrieval, and used to generate intelligent responses using the technology disclosed. The environment 700 includes at least one user device 702, 706. The user device 702 can be a mobile phone, tablet, workstation, desktop computer, laptop or any other type of user device running an application 704. The user device can also be an automobile 706 or any other combination of hardware and software that is running an application 704.

The user devices 702, 706 are connected to one or more communication networks 708 that allow for communication between various components of the environment 700 and that allow searches to be performed on the internet or other networks. In one implementation, the communication networks 708 include the internet. The communication networks 708 also can utilize dedicated or private communication links that are not necessarily part of the public internet. In one implementation, the communication networks 708 use standard communication technologies, protocols, and/or inter-process communication technologies. The user devices 702, 706 are capable of receiving, for example, a first query in a first language, where the purpose of the query is to perform a search on the internet or a private network. The application 704 is implemented on the user devices 702, 706 to capture the first query.

The environment 700 also includes applications 710 that can be preinstalled on the user devices 702, 706 or updated/installed on the user devices 702, 706 over the communication networks 708. Additionally, the environment 700 includes Application Programming Interfaces (APIs) 711 that can also be preinstalled on the user devices 702, 706 or updated/installed on the user devices 702, 706 over the communication networks 708. The APIs 711 can be implemented to allow the user devices 702, 706 and the applications 710 to easily gain access to other components of the environment 700 as well as to certain private networks.

The environment 700 also includes an interpreter 712 that can run on one or more platforms/servers that are part of a speech recognition system. The interpreter 712 can be a single computing device (e.g., a server), a cloud computing device, or any combination of computing devices, cloud computing devices, etc., that are capable of communicating with each other to perform the various tasks required for meaningful interpretation, as well as speech recognition, if desired. The interpreter 712 can include a deep learning system 714 that is capable of using artificial intelligence, neural networks, and/or machine learning to perform interpretations. The deep learning system 714 can implement language embedding(s), such as a model or models 716, as well as a natural language domain 718 for providing domain-specific translations and interpretations for natural language processing (NLP).

Since the interpreter 712 can be spread over multiple servers and/or cloud computing devices, the operations of the deep learning system 714, the language embedding(s) 716 and the natural language domains 718 can also be spread over multiple servers and/or cloud computing devices. The applications 710 can be used by and/or in conjunction with the interpreter 712 to translate spoken input, as well as text input and text file input. Again, the various components of the environment 700 can communicate (exchange data) with each other using customized APIs 711 for security and efficiency. The interpreter 712 is capable of interpreting a query or statement (e.g., a natural language utterance) obtained from the user devices 702, 706.

The user devices 702, 706 and the interpreter 712 can each include memory for storage of data and software applications, a processor for accessing data and executing applications, and components that facilitate communication over the communication networks 708. The user devices 702, 706 execute applications 704, such as web browsers (e.g., a web browser application 704 executing on the user device 702), to allow developers to prepare and submit applications 710 and to allow users to submit speech audio queries (e.g., the speech input 102 and query 103 of FIG. 1) including natural language utterances to be interpreted by the interpreter 712.

As mentioned above, the interpreter 712 can implement one or more language embeddings (models) 716 from a repository of embeddings (models) (not illustrated) that are created and trained using techniques known to a person of ordinary skill in the art.

As also mentioned above, the natural language domain 718 can be implemented by the interpreter 712 in order to add context or real meaning to the transcription of the received speech input.

The environment 700 can further include a topic analyzer 720 that can implement one or more topic models 722 to analyze and determine a topic of a query or statement. Some of the operations of the topic analyzer 720 could be performed during, for example, the transcription operation 106 of FIG. 1.

Furthermore, the environment 700 can include a disambiguator 724 that is able to utilize any type of external data 726 (e.g., disambiguation information) in order to add further meaning to an obtained query. Essentially, the disambiguator 724 is able to add further meaning to a query or statement by analyzing previous searches of the user, profile data of the user, location information, calendar information, date and time information, etc. For example, the disambiguator 724 can be used to add synonyms to the initial search that can help narrow the search to what the user wants to find. The disambiguator 724 can also add limits to the search, such as certain dates and/or timeframes (e.g., based on the travel plans of the user, additional limits can be added to the original query to identify events that are occurring while the user is traveling to a certain region).

For example, if the query 103 obtained by one of the user devices 702, 706 is “How long do I cook lasagna?”, the topic analyzer 720 can analyze the query and determine that the topic (or domain) is “memo.cooking”. The disambiguator 724 can use the external data 726 to determine that the user has been cooking at their mother's house for the past few days. Accordingly, the disambiguator 724 can extend the terms of the first query from “How long do I cook lasagna?” to “How long do I cook lasagna at my mother's house?” Before extending the query, the system can ask the user whether they are cooking at their home or at their mother's house. In other words, the combination of the results obtained by the topic analyzer 720 and the disambiguator 724 can narrow the scope of the query. The disambiguator 724 can also use other mechanisms to extend the keywords of received queries, either by asking the user broad or specific questions about their initial query, or by using artificial intelligence or other means to further narrow the initial query.
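The lasagna example can be sketched as follows, assuming the external data 726 has been reduced to a list of places where the user recently cooked. The `disambiguate` function and its `min_share` threshold are hypothetical: when one place clearly dominates, the query is extended; otherwise a clarification question is returned instead, reflecting the option of asking the user first.

```python
from collections import Counter
from typing import List, Optional, Tuple


def disambiguate(query: str, recent_places: List[str],
                 min_share: float = 0.8) -> Tuple[str, Optional[str]]:
    """Return (possibly extended query, clarification question or None)."""
    if not recent_places:
        return query, None
    place, count = Counter(recent_places).most_common(1)[0]
    if count / len(recent_places) >= min_share:
        # Context is unambiguous enough: narrow the query to that place.
        return f"{query.rstrip('?')} at {place}?", None
    # Context is mixed: keep the query unchanged and ask the user instead.
    options = " or ".join(sorted(set(recent_places)))
    return query, f"Are you cooking at {options}?"
```

Returning the clarification question instead of guessing keeps a wrong extension from silently narrowing the search to the wrong memo.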

Regardless of whether the topic analyzer 720 and/or the disambiguator 724 are used to change the scope of any of the queries or statements, a searcher 732 of the environment 700 performs a search for a memo or favorite information based on the obtained query. The searcher 732 can use language and domain data 734 to determine which domains should be searched.

The searcher 732 can, for example, identify a domain for a query in dependence upon at least one of a wake phrase, a trigger phrase, and the contents or topic of the query as determined by the topic analyzer 720. The searcher 732 is not limited to searching a single domain; it can search multiple domains in parallel or in series. For example, if an insufficient number of results are found after searching a first domain (e.g., the memo domain), a second domain (e.g., the favorites domain) may be searched.
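Series and parallel search across domains might be sketched like this. The domain search functions are assumed to be plain callables returning lists of hits; `search_in_series`, `search_in_parallel` and the `min_results` cutoff are illustrative names. The serial version stops as soon as enough results are found, mirroring the memo-then-favorites fallback described above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

SearchFn = Callable[[str], List[str]]


def search_in_series(query: str, domains: Sequence[Tuple[str, SearchFn]],
                     min_results: int = 1) -> List[Tuple[str, str]]:
    """Search domains in order, stopping once enough results have been found."""
    collected: List[Tuple[str, str]] = []
    for name, search in domains:
        collected.extend((name, hit) for hit in search(query))
        if len(collected) >= min_results:
            break
    return collected


def search_in_parallel(query: str,
                       domains: Sequence[Tuple[str, SearchFn]]) -> List[Tuple[str, str]]:
    """Search every domain concurrently and merge the results in domain order."""
    with ThreadPoolExecutor() as pool:
        futures = [(name, pool.submit(search, query)) for name, search in domains]
        return [(name, hit) for name, future in futures for hit in future.result()]
```

The serial form saves work when the first domain usually answers; the parallel form trades extra computation for lower latency when several domains are likely to contribute.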

Various scoring techniques can be implemented, as will be understood by one of ordinary skill in the art. Further, the user may have the option to select which scoring and ranking techniques are implemented. For example, the user may choose to have scoring and ranking implemented (and presented) independently for each domain. The scorer/ranker 730 may present only the top X results or a top Y percentage of results so as not to overwhelm the user.
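Truncating to the top X results or the top Y percent, optionally per domain, could look like the following. The function names and the rule of rounding the percentage up (so a non-empty list with Y greater than 0 always keeps at least one result) are assumptions for illustration; when both cutoffs are given, the smaller one wins.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Scored = Tuple[str, float]  # (result, score)


def top_results(scored: List[Scored], top_x: Optional[int] = None,
                top_y_percent: Optional[float] = None) -> List[Scored]:
    """Rank by score and keep the smaller of the top-X and top-Y% cutoffs."""
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    limit = len(ranked)
    if top_x is not None:
        limit = min(limit, top_x)
    if top_y_percent is not None:
        limit = min(limit, math.ceil(len(ranked) * top_y_percent / 100))
    return ranked[:limit]


def top_per_domain(results: List[Tuple[str, str, float]],
                   top_x: int) -> Dict[str, List[Scored]]:
    """Rank and truncate each domain's results independently."""
    by_domain: Dict[str, List[Scored]] = defaultdict(list)
    for domain, result, score in results:
        by_domain[domain].append((result, score))
    return {domain: top_results(items, top_x=top_x) for domain, items in by_domain.items()}
```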

Whether the results are presented in speech or text, the technology disclosed can also provide a brief visual or auditory summary of each result, making it easier for the user to determine which results they would like to view first.

The interpreter 712, topic analyzer 720, disambiguator 724, scorer/ranker 730 and/or the searcher 732 can be implemented using at least one hardware component and can also include firmware, or software running on hardware. Software that is combined with hardware to carry out the actions of the interpreter 712, topic analyzer 720, disambiguator 724, scorer/ranker 730 and/or the searcher 732 can be stored on computer-readable media such as rotating or non-rotating memory. The non-rotating memory can be volatile or non-volatile. In this application, computer-readable media does not include a transitory electromagnetic signal that is not stored in a memory; computer-readable media store program instructions for execution. The interpreter 712, topic analyzer 720, disambiguator 724, scorer/ranker 730 and/or the searcher 732, as well as the applications 710, the topic models 722, external data 726, the language and domain data 734 and the APIs 711 can be wholly or partially hosted and/or executed in the cloud or by other entities connected through the communication networks 708.

FIG. 8 is a block diagram of an example computer system that can implement various components of the environment 700 of FIG. 7. Computer system 810 typically includes at least one processor 814, which communicates with a number of peripheral devices via bus subsystem 812. These peripheral devices may include a storage subsystem 824, comprising for example memory devices and a file storage subsystem, user interface input devices 822, user interface output devices 820, and a network interface 815. The input and output devices allow user interaction with computer system 810. Network interface 815 provides an interface to outside networks, including an interface to the communication networks 708, and is coupled via the communication networks 708 to corresponding interface devices in other computer systems.

User interface input devices 822 may include audio input devices such as speech recognition systems, microphones, and other types of input devices. In general, use of the term “input device” is intended to include all possible types of devices and ways to input speech information into computer system 810 or onto communication network 708.

User interface output devices 820 may include a display subsystem, a printer, a fax machine, or non-visual displays such as audio output devices. The display subsystem may include a cathode ray tube (CRT), a flat-panel device such as a liquid crystal display (LCD), a projection device, or some other mechanism for creating a visible image. The display subsystem may also provide non-visual display such as via audio output devices. In general, use of the term “output device” is intended to include all possible types of devices and ways to output information from computer system 810 to the user or to another machine or computer system.

Storage subsystem 824 stores programming and data constructs that provide the functionality of some or all of the modules described herein. These software modules are generally executed by processor 814 alone or in combination with other processors.

Memory subsystem 825 used in the storage subsystem can include a number of memories including a main random-access memory (RAM) 830 for storage of instructions and data during program execution and a read only memory (ROM) 832 in which fixed instructions are stored. A file storage subsystem 828 can provide persistent storage for program and data files, and may include a hard disk drive, a floppy disk drive along with associated removable media, a CD-ROM drive, an optical drive, or removable media cartridges. The modules implementing the functionality of certain embodiments may be stored by file storage subsystem 828 in the storage subsystem 824, or in other machines accessible by the processor.

Bus subsystem 812 provides a mechanism for letting the various components and subsystems of computer system 810 communicate with each other as intended. Although bus subsystem 812 is shown schematically as a single bus, alternative embodiments of the bus subsystem may use multiple busses.

Computer system 810 can be of varying types including a workstation, server, computing cluster, blade server, server farm, or any other data processing system or computing device. Due to the ever-changing nature of computers and networks, the description of computer system 810 depicted in FIG. 8 is intended only as a specific example for purposes of illustrating the various embodiments. Many other configurations of computer system 810 are possible having more or fewer components than the computer system depicted in FIG. 8.

Some Particular Implementations

We describe various implementations of retrieving a personal memo from a database and storing a memo in a database.

The technology disclosed can be practiced as a system, method, or article of manufacture. One or more features of an implementation can be combined with the base implementation. Implementations that are not mutually exclusive are taught to be combinable. One or more features of an implementation can be combined with other implementations. This disclosure periodically reminds the reader of these options. Omission from some implementations of recitations that repeat these options should not be taken as limiting the combinations taught in the preceding sections; these recitations are hereby incorporated forward by reference into each of the following implementations.

A method implementation of the technology disclosed includes a method of retrieving a personal memo from a database. The method includes receiving, by a virtual assistant, a natural language utterance that expresses a request; interpreting the natural language utterance according to a natural language grammar rule for retrieving memo data from the natural language utterance, the natural language grammar rule recognizing query information; responsive to interpreting the natural language utterance, using the query information to query the database for a memo related to the query information; and providing, to a user, a response generated in dependence upon the memo related to the query information.

According to an implementation, the natural language grammar rule for retrieving memo data is selected from a plurality of domain-dependent grammar rules in accordance with the contents of the received natural language utterance.
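One simple way to pick a rule from several domain-dependent grammar rules is to test the utterance against each domain's pattern in a fixed priority order. The sketch below assumes hypothetical regular-expression patterns for three domains (weather, music, memo); a real grammar would be far richer, and the fixed ordering is an assumption made for brevity.

```python
import re

# Hypothetical domain-dependent grammar rules, checked in insertion order (priority).
GRAMMAR_RULES = {
    "weather": re.compile(r"\b(?:weather|rain|forecast)\b", re.IGNORECASE),
    "music": re.compile(r"\b(?:play|song|album)\b", re.IGNORECASE),
    "memo": re.compile(r"\b(?:what|where|when|who)\b.*\bmy\b", re.IGNORECASE),
}


def select_rule(utterance):
    """Return the first domain whose rule matches the utterance, or None."""
    for domain, rule in GRAMMAR_RULES.items():
        if rule.search(utterance):
            return domain
    return None
```

Because Python dictionaries preserve insertion order, the order of `GRAMMAR_RULES` acts as the tie-breaker when more than one domain's pattern matches.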

In another implementation, the database is queried for the memo related to the query information by searching the database to identify any memo that includes information sufficient to provide an appropriate response to the user.
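As one illustration of what "information sufficient to provide an appropriate response" might mean, the following sketch treats a memo as sufficient when its words cover every content word of the query. The stopword list and the coverage criterion are assumptions chosen for brevity, not the search method of the disclosure.

```python
import re

# Assumed stopword list; a production system would use a real NLP pipeline.
STOPWORDS = {
    "what", "where", "when", "who", "is", "are", "was", "did", "do",
    "i", "my", "me", "the", "a", "an", "of", "to",
}


def content_words(text):
    """Lower-cased word tokens of `text`, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS}


def find_sufficient_memos(query, memos):
    """Return the memos whose words cover every content word of the query."""
    needed = content_words(query)
    if not needed:
        return []  # nothing specific to search for
    return [m for m in memos if needed <= content_words(m)]
```

The subset test `needed <= content_words(m)` is deliberately strict; a looser variant could rank memos by overlap instead of requiring full coverage.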

In an implementation, the response is provided to the user such that the response answers the request expressed by the natural language utterance, as opposed to providing a word-for-word repeat of a transcription.
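Answering the request rather than replaying the transcription can involve, among other things, restating the memo from the assistant's point of view. The sketch below shows only that one step, using an assumed map from first-person to second-person wording; `PERSPECTIVE`, `PRONOUNS`, and `rephrase` are hypothetical names, and this is not a complete response generator.

```python
import re

# Assumed map from the user's first-person wording to the assistant's second person.
PERSPECTIVE = {"i am": "you are", "i'm": "you're", "i": "you",
               "me": "you", "my": "your", "mine": "yours"}
# Longer alternatives come first so "I am" and "I'm" win over a bare "I".
PRONOUNS = re.compile(r"\b(?:I am|I'm|I|me|my|mine)\b", re.IGNORECASE)


def rephrase(memo_text):
    """Restate a first-person memo from the assistant's point of view."""
    def swap(match):
        replacement = PERSPECTIVE[match.group(0).lower()]
        before = memo_text[: match.start()].rstrip()
        # Capitalize only at the start of the text or of a new sentence.
        if not before or before.endswith((".", "!", "?")):
            return replacement.capitalize()
        return replacement

    return PRONOUNS.sub(swap, memo_text)
```

For instance, a memo recorded as "I parked my car on level 3" would be spoken back as "You parked your car on level 3".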

A further implementation includes identifying a trigger phrase from the received natural language utterance and, responsive to identifying the trigger phrase, selecting the natural language grammar rule for retrieving memo data in dependence upon at least one of (i) the identified trigger phrase and (ii) other contents of the natural language utterance.

In an implementation, the trigger phrase includes a personal pronoun followed by an interrogative pronoun or a relative pronoun that is within 5 words of the personal pronoun.
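The trigger-phrase test can be sketched as a scan over word tokens: for each personal pronoun, look ahead up to five words for an interrogative or relative pronoun. The pronoun lists below are assumptions, since the disclosure does not enumerate them, and `has_trigger_phrase` is a hypothetical name.

```python
import re

# Assumed word lists; the disclosure does not enumerate them.
PERSONAL = {"i", "me", "you", "he", "she", "it", "we", "they", "him", "her", "us", "them"}
INTERROGATIVE_OR_RELATIVE = {"what", "which", "who", "whom", "whose", "that"}


def has_trigger_phrase(utterance, max_distance=5):
    """True if a personal pronoun is followed, within `max_distance` words,
    by an interrogative or relative pronoun."""
    words = re.findall(r"[a-z']+", utterance.lower())
    for i, word in enumerate(words):
        if word in PERSONAL:
            window = words[i + 1 : i + 1 + max_distance]  # the next five words
            if any(w in INTERROGATIVE_OR_RELATIVE for w in window):
                return True
    return False
```

A positive result could then be used to favor the memo-retrieval grammar rule when selecting among candidate rules.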

In a different implementation, the method can include receiving an indication that the user spoke a memo-specific wake phrase before the natural language utterance.

In a further implementation, the database storing the memo is a structured database, such that the memo is stored in a structured format, and in another implementation, the database storing the memo is an unstructured database, such that the memo is stored in an unstructured format.

In one implementation, the method includes receiving, from the user, a natural language utterance including memo information, interpreting the natural language utterance to extract the memo information, and storing the memo information in the database as a memo.
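A minimal sketch of the storage path, assuming a hypothetical storage grammar of the form "remember (that) my <subject> is <value>" and a Python list of dictionaries as the database. The receipt time is recorded so that later retrieval can prefer the most recent memo; the optional `now` parameter is an assumption that makes the sketch testable.

```python
import re
import time

# Hypothetical storage grammar: "remember (that) my <subject> is/are <value>".
STORE_RULE = re.compile(
    r"^remember(?: that)? my (?P<subject>.+?) (?:is|are) (?P<value>.+?)\.?$",
    re.IGNORECASE,
)


def store_memo(utterance, database, now=None):
    """Extract memo information and append it to `database`; return True on success."""
    match = STORE_RULE.match(utterance.strip())
    if match is None:
        return False  # not a storage request under this grammar
    database.append({
        "subject": match.group("subject").lower(),
        "value": match.group("value"),
        "received_at": time.time() if now is None else now,
    })
    return True
```

Here "Remember that my locker code is 4512" is stored under the subject `locker code` with the value `4512`.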

In another implementation, the stored interpretation of the natural language utterance including the memo information includes personal information about the user.

Moreover, an implementation can include receiving, interpreting and storing multiple natural language utterances including the memo information as memos that relate to a subject, along with additional information indicating the time order in which they were received, and generating the response in dependence upon a stored memo that (i) relates to the subject and (ii) was interpreted from the most recently received natural language utterance including the memo information relating to the subject.
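Preferring the most recently received memo on a subject maps naturally onto an ordered database query. The sketch below uses Python's built-in `sqlite3` module with an assumed `memos` table schema; the function names are hypothetical, and receipt times are passed in explicitly so the example is deterministic.

```python
import sqlite3


def open_memo_db():
    conn = sqlite3.connect(":memory:")  # in-memory database for the sketch
    conn.execute("CREATE TABLE memos (subject TEXT, text TEXT, received_at REAL)")
    return conn


def add_memo(conn, subject, text, received_at):
    conn.execute(
        "INSERT INTO memos (subject, text, received_at) VALUES (?, ?, ?)",
        (subject, text, received_at),
    )


def latest_memo(conn, subject):
    """Return the text of the most recently received memo on `subject`, or None."""
    row = conn.execute(
        "SELECT text FROM memos WHERE subject = ? ORDER BY received_at DESC LIMIT 1",
        (subject,),
    ).fetchone()
    return row[0] if row else None
```

Older memos stay in the table in this variant; only the query decides which one is used for the response.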

Another implementation may include, when multiple natural language utterances including the memo information are received, interpreted and stored in the database as memos that relate to a subject, replacing the previously stored memos that relate to the subject with the most recently stored memo that relates to the subject.
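Replacing earlier memos on the same subject can be sketched by making the subject a primary key, so that each new memo overwrites the previous one. This uses SQLite's `INSERT OR REPLACE` through Python's `sqlite3` module; the table layout and function names are assumptions for illustration.

```python
import sqlite3


def open_replacing_db():
    conn = sqlite3.connect(":memory:")
    # Making `subject` the primary key means each subject holds at most one memo.
    conn.execute(
        "CREATE TABLE memos (subject TEXT PRIMARY KEY, text TEXT, received_at REAL)"
    )
    return conn


def store_replacing(conn, subject, text, received_at):
    # On a key conflict, SQLite deletes the old row and inserts the new one.
    conn.execute(
        "INSERT OR REPLACE INTO memos (subject, text, received_at) VALUES (?, ?, ?)",
        (subject, text, received_at),
    )
```

Compared with keeping every memo and sorting at query time, this trades history for a smaller table and a trivial lookup.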

According to one implementation, the method includes allowing the user to confirm or acknowledge whether or not the user intended for the natural language utterance including the memo information to be stored as the memo.

According to a further implementation, the method includes deleting the stored memo related to the natural language utterance including the memo information when the user indicates that the natural language utterance including the memo information was not intended to be stored as the memo.
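The confirm-then-delete behavior of the two implementations above might look like the following sketch, where `confirm` is a hypothetical callback standing in for the spoken exchange with the user and the database is a plain list.

```python
def store_with_confirmation(database, memo, confirm):
    """Store `memo`, ask the user to confirm, and delete it again if they decline.

    `confirm` is a hypothetical callback that poses a yes/no question to the user.
    """
    index = len(database)
    database.append(memo)
    if confirm(f"Should I remember: {memo}?"):
        return True
    del database[index]  # the user did not intend this to be stored as a memo
    return False
```

Storing first and deleting on refusal mirrors the order described above; an alternative design would hold the memo in a pending state until the user confirms.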

According to another implementation, the method includes assigning a time period to the memo, after which the memo will expire, and removing the memo from the database when the time period has expired.
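Expiration can be sketched by storing, with each memo, the time it was stored and an optional time-to-live, then filtering out expired memos. The field names, the use of seconds as the unit, and the `purge_expired` helper are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Memo:
    text: str
    stored_at: float             # seconds since some reference time
    ttl: Optional[float] = None  # lifetime in seconds; None means it never expires


def purge_expired(memos, now):
    """Return only the memos whose time period has not yet expired at time `now`."""
    return [m for m in memos if m.ttl is None or m.stored_at + m.ttl > now]
```

A memo such as a parking location might get a TTL of a few hours, while a Wi-Fi password might be stored with no expiry at all.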

An implementation may also include interpreting the natural language utterance that expresses the request according to multiple domains, each domain of the multiple domains having an associated relevancy score for the interpreted utterance, wherein a memo domain is one of the multiple domains, and wherein the memo domain has a score advantage relative to the other domains.
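Giving the memo domain a score advantage can be as simple as adding a fixed bonus to its relevancy score before choosing the highest-scoring domain. The bonus value of 0.1, the score range, and the `pick_domain` name are assumptions for illustration.

```python
# Assumed score advantage for the memo domain; the disclosure does not give a value.
MEMO_BONUS = 0.1


def pick_domain(scores, memo_domain="memo", bonus=MEMO_BONUS):
    """Choose the domain with the highest relevancy score after boosting the memo domain."""
    adjusted = {
        domain: score + (bonus if domain == memo_domain else 0.0)
        for domain, score in scores.items()
    }
    return max(adjusted, key=adjusted.get)
```

With these numbers, a memo interpretation scoring 0.5 beats a weather interpretation scoring 0.55, but not one scoring 0.8.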

Additionally, according to one implementation, the method may include storing a recording of the natural language utterance that expresses the request and/or storing a recording of the natural language utterance including the memo information.

According to an implementation, a first particular interpretation of the text transcription is stored in the database in association with a first domain and a second particular interpretation of the transcription is stored in the database in association with a second domain, such that two or more interpretations are stored in the database.

One implementation may include storing meta-data along with the memo, where the meta-data include information such as short-term activity information, daily weather information, and until-event-occurs information, and where the meta-data can be explicitly stated by the user or inferred from other information, including other memos, regular commute information and/or calendar information.

Other implementations may include a non-transitory computer-readable recording medium having a computer program for retrieving a personal memo from a database recorded thereon. The computer program, when executed on one or more processors, causes the processors to perform the method described above and any of the above-described implementations. Specifically, the method includes receiving, by a virtual assistant, a natural language utterance that expresses a request; interpreting the natural language utterance according to a natural language grammar rule for retrieving memo data from the natural language utterance, the natural language grammar rule recognizing query information; responsive to interpreting the natural language utterance, using the query information to query the database for a memo related to the query information; and providing, to a user, a response generated in dependence upon the memo related to the query information.

Each of the features discussed in this particular implementation section for the first method implementation applies equally to the CRM implementation. As indicated above, not all the method features are repeated here, and they should be considered repeated by reference.

A system implementation of the technology disclosed includes one or more processors coupled to memory. The memory is loaded with computer instructions to retrieve a personal memo from a database. The instructions, when executed on the one or more processors, implement actions including receiving, by a virtual assistant, a natural language utterance that expresses a request; interpreting the natural language utterance according to a natural language grammar rule for retrieving memo data from the natural language utterance, the natural language grammar rule recognizing query information; responsive to interpreting the natural language utterance, using the query information to query the database for a memo related to the query information; and providing, to a user, a response generated in dependence upon the memo related to the query information.

This system implementation and other systems disclosed optionally include one or more of the following features. The system can also include features described in connection with the methods disclosed. In the interest of conciseness, alternative combinations of system features are not individually enumerated. Features applicable to systems, methods, and articles of manufacture are not repeated for each statutory class set of base features. The reader will understand how features identified in this section can readily be combined with base features in other statutory classes.

A given event or value is “responsive” (e.g., “in response to” or “responsive to”) to a predecessor event or value if the predecessor event or value influenced the given event or value. If there is an intervening processing element, step or time period, the given event or value can still be “responsive” to the predecessor event or value. If the intervening processing element or step combines more than one event or value, the signal output of the processing element or step is considered “responsive” to each of the event or value inputs. If the given event or value is the same as the predecessor event or value, this is merely a degenerate case in which the given event or value is still considered to be “responsive” to the predecessor event or value. “Dependency” (e.g., “in dependence upon” or “in dependence on”) of a given event or value upon another event or value is defined similarly.

We claim as follows:
1. A computer-implemented method comprising: receiving commands to store memos; identifying subjects related to the memos; storing, in a database, the memos, their related subjects, and associated time information; receiving a natural language request to retrieve a memo, the request having query information; identifying a subject related to the request; responsive to the request, querying the database for memos related to the subject; identifying multiple memos in response to the database query; identifying a memo, from the multiple identified memos, that has the most recent associated time information; and providing a response in dependence on the identified memo.
2. The method of claim 1, wherein the natural language request is parsed according to a grammar rule for retrieving memos.
3. The method of claim 2, further comprising: identifying a trigger phrase from the received natural language request; and selecting the grammar rule in dependence upon the identified trigger phrase.
4. The method of claim 1, wherein the database storing the memo is a structured database, such that the memo is stored in a structured format.
5. The method of claim 1, wherein the database storing the memo is an unstructured database, such that the memo is stored in an unstructured format.
6. The method of claim 1, further comprising removing memos from the database after a predetermined time period.
7. A non-transitory computer-readable recording medium having a computer program recorded thereon, the computer program, when executed on one or more processors, causing the processors to perform a method comprising: receiving commands to store memos; identifying subjects related to the memos; storing, in a database, the memos, their related subjects, and associated time information; receiving a natural language request to retrieve a memo, the request having query information; identifying a subject related to the request; responsive to the request, querying the database for memos related to the subject; identifying multiple memos in response to the database query; identifying a memo, from the multiple identified memos, that has the most recent associated time information; and providing a response in dependence on the identified memo.
8. The non-transitory computer-readable recording medium of claim 7, wherein the natural language request is parsed according to a grammar rule for retrieving memos.
9. The non-transitory computer-readable recording medium of claim 8, wherein the method further comprises: identifying a trigger phrase from the received natural language request; and selecting the grammar rule in dependence upon the identified trigger phrase.
10. The non-transitory computer-readable recording medium of claim 7, wherein the database storing the memo is a structured database, such that the memo is stored in a structured format.
11. The non-transitory computer-readable recording medium of claim 7, wherein the database storing the memo is an unstructured database, such that the memo is stored in an unstructured format.
12. The non-transitory computer-readable recording medium of claim 7, wherein the method further comprises removing memos from the database after a predetermined time period.
13. A system including one or more processors coupled to memory, the memory being loaded with computer instructions, the computer instructions, when executed on the one or more processors, causing the one or more processors to implement actions comprising: receiving commands to store memos; identifying subjects related to the memos; storing, in a database, the memos, their related subjects, and associated time information; receiving a natural language request to retrieve a memo, the request having query information; identifying a subject related to the request; responsive to the request, querying the database for memos related to the subject; identifying multiple memos in response to the database query; identifying a memo, from the multiple identified memos, that has the most recent associated time information; and providing a response in dependence on the identified memo.
14. The system of claim 13, wherein the natural language request is parsed according to a grammar rule for retrieving memos.
15. The system of claim 14, wherein the actions further comprise: identifying a trigger phrase from the received natural language request; and selecting the grammar rule in dependence upon the identified trigger phrase.
16. The system of claim 13, wherein the database storing the memo is a structured database, such that the memo is stored in a structured format.
17. The system of claim 13, wherein the database storing the memo is an unstructured database, such that the memo is stored in an unstructured format.
18. The system of claim 13, wherein the actions further comprise removing memos from the database after a predetermined time period.